ABSTRACT

Recent research in syntactic complexity has described 
and specified structural characteristics that distinguish levels of 
syntactic density. A computer program, written in Programming Language
I for the IBM 370 at Pennsylvania State University, has been
developed to apply a syntactic density formula to samples of natural
language and provide a single quantitative score. The program is
being used to assess syntactic density as a factor of readability in 
samples of graded reading material and to measure stages in 
children's language development as evidenced in written language 
samples from children in elementary school. (Author) 
COMPUTER APPLICATION OF A SYNTACTIC DENSITY MEASURE

It is empirically obvious that there are varying degrees of complexity
in syntax in different levels of graded reading materials and in the oral
and written language of children at different levels of development. Research
in language education and in language development could be facilitated by
using the computer to analyze and measure syntactic density (or syntactic
complexity or syntactic maturity).

One of the characteristics of children's language development is that,
with increasing maturity, children use more and more complex structures.
Research has shown (Hunt, 1965, 1970; Loban, 1963, 1970; O'Donnell, Griffin,
and Norris, 1967) that even after the pre-school period of rapid language
acquisition, students of elementary and secondary school ages continue to
develop abilities to manipulate language by employing more complicated
syntactic structures. Much of the recent research in syntactic development (Hunt,
1970; Loban, 1970; Golub and Frederick, 1971) has been aimed at discovering,
describing, and specifying those characteristics of syntax that distinguish
degrees of complexity, maturity, or density of syntax. Out of this research
have come some instruments that provide single, quantitative scores to
distinguish between levels of syntactic density.

When any of these instruments is applied to language samples, hand
tabulation is tedious, time-consuming, and subject to the inconsistencies of
human error. Some of the instruments require that the analyst have some
degree of sophistication in linguistic analysis. The cost, in time and
training, of hand analysis inhibits research designs that require analysis of
large samples of natural language. Significant research findings in many
possible studies of language development would require sizable samples of
text. Analysis of syntactic density could also be useful for comparison of
stylistic characteristics of speakers or authors and for assessment of
syntactic load as a factor of readability.

Since research (Golub, 1971) has identified and described specific
syntactic features that indicate increased degrees of syntactic density, the next
logical step seems to be to program the computer to apply the instruments to
language samples for fast, efficient, and consistent results.

Chomsky repeatedly asserts that language performance cannot be equated
with language competence. At best, measurement of performance can only give
an indication of competence. Yet, realizing that we cannot discover
everything about idealized language competence, we should not be prevented from
learning about improved methods of measuring performance.

As the child learns to put words together in meaningful relationships, he
is developing a grammar that enables him to generate an increasing variety of
unique sentences. Through his own processes of observation, classification,
hypothesis-making, and hypothesis-testing, he moves more and more toward the
adult model of his language community. In the early stages of language
development, surface structures and deep structures are relatively isometric,
and utterance units are frequently kernel-like units. As conceptual ability,
vocabulary, and relational abilities develop along with increasingly powerful
rules for sentence formation and transformation, the map from surface structure
to deep structure becomes more complicated. What might have been several
communication units before becomes a single unit, more fully packed with
meanings which are manipulated by more complicated syntactic structures.
The syntactic density increases.

To allow quantitative comparisons of syntactic complexity in samples of
language, an agreed-upon basic unit of comparison has had to be found. Early
research (LaBrant, 1933) found the sentence too subjective a measure. Hunt
has defined T-unit word length, where a T-unit is a main clause with all of
its subordinate clauses, as a more reliable measure. Subsequent research
(Hunt, 1970; O'Donnell, 1967) has substantiated the reliability of the T-unit
measure and found that with increasing maturity T-unit length tends to increase.

Within the last fifteen years research has contributed valuable information
about the types of structures and amount of their use in the oral and
written language of children at various ages. Some of these studies were
conducted before the advent of transformational grammar. Some have dealt
transformationally with limited age ranges. Among the measures that Hunt
found to be significantly related to increased syntactic maturity were T-unit
length, subordinate clause length, and reductions to less than a predicate.
O'Donnell, Griffin, and Norris (1967) found that T-unit length, number of
sentence-combining transformations, and deletion transformations contributed
substantially to structural complexity when oral and written samples of
language from children in grades 3, 5, and 7 were analyzed.

Through a series of studies of children's oral and written discourse,
Golub has developed a Syntactic Density instrument that tabulates the
occurrences of specific linguistic structures that correlate with teachers'
judgments of writing samples. In an early stage of the study, sixty-three
linguistic variables were listed. Multivariate analysis isolated the ten
variables that most highly correlated with teachers' high ratings. Canonical
correlation assigned a relative weight to each variable according to the
degree of its contribution to "syntactic density." The resulting Tabulation
Sheet for a SYNTACTIC DENSITY SCORE provides for the calculation: each
variable's frequency is multiplied by its weight, and the products are added.
The total is divided by the number of T-units in the sample to arrive at a
single syntactic density score. The variables included in Golub's formula
reflect structures that have been identified in linguistic theory as being
complex structures. Measures of mean main clause length and mean subordinate
clause length are combined with measures of these other types of complexities.

Insert Figure 1 about here 

Golub's formula for measuring syntactic density has been selected as
the instrument to be programmed for the computer. It incorporates the measures
of T-unit length and subordinate clause length that Hunt and others have
found useful. It also reflects complex verb expansions, use of some advanced
structures of time, and reductions or embeddings that take the form of
prepositional phrases. Hand tabulation by Golub's formula is rather
time-consuming and requires some training for the rater.

A program for use on an IBM 370 computer has been written in PL/1 by
Carole Kidder to apply the formula to samples of natural language.
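
The arithmetic of the formula can be illustrated with a minimal sketch, written here in Python rather than the PL/1 of the program itself. It uses the ten loadings listed on the Figure 1 tabulation sheet; the dictionary keys and the function name are hypothetical labels chosen for this illustration, and the first four entries are taken to be the ratios and means exactly as they are named on the sheet.

```python
# Loadings as listed on the Syntactic Density Score tabulation sheet (Figure 1).
# The key names are hypothetical labels, not identifiers from the PL/1 program.
LOADINGS = {
    "words_per_t_unit": 0.95,
    "subordinate_clauses_per_t_unit": 0.90,
    "mean_main_clause_length": 0.20,
    "mean_subordinate_clause_length": 0.50,
    "modals": 0.65,
    "be_have_auxiliaries": 0.40,
    "prepositional_phrases": 0.75,
    "possessives": 0.70,
    "adverbs_of_time": 0.60,
    "unbound_modifiers": 0.85,
}


def syntactic_density_score(frequencies, t_units):
    """Sum loading x frequency over the ten variables, then divide by T-units."""
    if t_units <= 0:
        raise ValueError("a sample must contain at least one T-unit")
    total = sum(LOADINGS[name] * frequencies[name] for name in LOADINGS)
    return total / t_units
```

A caller would fill `frequencies` with one value per variable for a sample and pass the sample's T-unit count as `t_units`.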

Figure 1

SYNTACTIC DENSITY SCORE
Tabulation Sheet

Total number of words     ______
Total number of T-units   ______

Variable                                                 Variable
Number    Variable Description                           Loading       Frequency    VL x F

  1.      Words/T-unit                                     .95     x    ______      ______
  2.      Subordinate clauses/T-unit                       .90     x    ______      ______
  3.      Main clause word length (mean)                   .20     x    ______      ______
  4.      Subordinate clause word length (mean)            .50     x    ______      ______
  5.      Number of Modals (will, shall, can, may,         .65     x    ______      ______
          must, would...)
  6.      Number of Be and Have forms in the auxiliary     .40     x    ______      ______
  7.      Number of Prepositional Phrases                  .75     x    ______      ______
  8.      Number of Possessive nouns and pronouns          .70     x    ______      ______
  9.      Number of Adverbs of Time (when, then,           .60     x    ______      ______
          once, while...)
 10.      Number of gerunds, participles, and absolute     .85     x    ______      ______
          phrases (unbound modifiers)

                                                                         Total      ______

SDS: S.D. Score (Total/No. of T-units)   ______

Grade Level Conversion                   ______

Grade Level Conversion Table:

SDS            .5  1.3  2.1  2.9  3.7  4.5  5.3  6.1  6.9  7.7  8.5  9.3  10.1  10.9
Grade Level     1    2    3    4    5    6    7    8    9   10   11   12    13    14

Encoding Conventions for Data

Text to be analyzed by the computer program must be keypunched, or
typed on a Remote Job Entry terminal, in blank-delimited form in columns 1 to 72.
This means that each word and syntactic punctuation mark must be preceded
and followed by at least one blank. Lexical punctuation, such as in hyphenated
words or in abbreviations, is not separated from its associated character
string by blanks. Multiple blanks are ignored. Quotation marks surrounding
conversation may be omitted. At the end of each paragraph, the final
mark of punctuation must be doubled. To separate samples, at the end of each
sample three dollar signs ($$$) must appear in columns 1 to 3.
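
These conventions can be read back mechanically. The following Python sketch is one possible reading of them, not the input routine of the PL/1 program: it assumes that sentence-final marks are the period, question mark, and exclamation point, and that a doubled mark may be typed either as two blank-delimited marks or as one two-character token. The function name `read_samples` is hypothetical.

```python
SENTENCE_END = {".", "?", "!"}  # assumed set of sentence-final marks


def read_samples(lines):
    """Split card images into samples, each a list of paragraphs of tokens."""
    samples, paragraphs, tokens = [], [], []
    for line in lines:
        card = line[:72]                 # only columns 1 to 72 carry text
        if card.startswith("$$$"):       # sample separator in columns 1 to 3
            if tokens:
                paragraphs.append(tokens)
            samples.append(paragraphs)
            paragraphs, tokens = [], []
            continue
        for token in card.split():       # split() ignores multiple blanks
            # A doubled final mark typed as one token, e.g. ".."
            doubled = (len(token) == 2 and token[0] == token[1]
                       and token[0] in SENTENCE_END)
            # A doubled final mark typed as two blank-delimited tokens
            repeated = (token in SENTENCE_END and bool(tokens)
                        and tokens[-1] == token)
            if doubled:
                tokens.append(token[0])
            if doubled or repeated:
                paragraphs.append(tokens)  # doubled mark closes the paragraph
                tokens = []
            else:
                tokens.append(token)
    if tokens:
        paragraphs.append(tokens)
    if paragraphs:
        samples.append(paragraphs)
    return samples
```

Because every word and syntactic punctuation mark is blank-delimited, a plain whitespace split is enough to recover the tokens; no further tokenizing rules are needed.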

The first job step in the program is an Indexer that picks off the
individual words of all the text to be analyzed and puts them in a structure
that indexes each one according to linear number with respect to the entire
text, word in sentence, sentence number, paragraph number, author number,
and sample group number.[1] The output record of the Indexer is stored on a
temporary systems disc to be fed into the next job step, the Analyzer.
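
The kind of record the Indexer produces can be sketched as follows. This is a Python illustration under stated assumptions, not the Indexer itself: the record layout, the field names, numbering from 1, the indexing of punctuation marks alongside words, and sentence numbers that run on across paragraphs are all choices made for the example.

```python
from dataclasses import dataclass

SENTENCE_END = {".", "?", "!"}  # assumed set of sentence-final marks


@dataclass
class IndexedWord:
    text: str
    linear: int            # position with respect to the entire text
    word_in_sentence: int
    sentence: int
    paragraph: int
    author: int
    group: int             # sample group number


def index_paragraphs(paragraphs, author=1, group=1):
    """Attach the six index numbers to every token of one author's text."""
    records = []
    linear = 0
    sentence = 1
    for paragraph_no, tokens in enumerate(paragraphs, start=1):
        word_in_sentence = 0
        for token in tokens:
            linear += 1
            word_in_sentence += 1
            records.append(IndexedWord(token, linear, word_in_sentence,
                                       sentence, paragraph_no, author, group))
            if token in SENTENCE_END:   # next token starts a new sentence
                sentence += 1
                word_in_sentence = 0
    return records
```

With every token carrying its own coordinates, a later step can pull in one sentence at a time simply by selecting records on the sentence number.
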
Analyzer 

The Analyzer step of the job begins by giving the computer some
reference lists, or dictionaries, and some decision-making capacity. Stop lists
are initialized into the program so that the computer can reference and
cross-reference coordinating conjunctions, subordinating conjunctions, relative
pronouns, modals, forms of "have" and "be", prepositions, possessive pronouns,
and adverbs of time. A six-by-twelve transition matrix, designed by Paul
Schuepp of Pennsylvania State University, houses the decision power. The
accompanying diagram (Figure 2) displays a conceptualization of the Transition
Matrix. Figure 3 is a summary of the Transition Matrix routines that control
the program.
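
The general idea of a matrix that maps a current state and a word class to a routine can be shown with a very small table-driven sketch in Python. It is only an illustration of the technique: the two states, the three word classes, the sample subordinator list, and every cell entry below are invented for the example and are not the contents of the six-by-twelve matrix of Figure 2. The modal list is the one given on the tabulation sheet.

```python
from collections import Counter

MODALS = {"will", "shall", "can", "may", "must", "would"}
SUBORDINATORS = {"because", "although", "if"}  # hypothetical sample stop list


def classify(word):
    """Assign a word class by stop-list lookup."""
    w = word.lower()
    if w in MODALS:
        return "MODAL"
    if w in SUBORDINATORS:
        return "SUBCONJ"
    return "OTHER"


def count_modal(counts, state):
    counts["modals"] += 1
    return state


def open_subordinate(counts, state):
    counts["subordinate_clauses"] += 1
    return "IN_SUB"


def no_action(counts, state):
    return state


# Hypothetical matrix: (state, word class) -> routine to perform.
MATRIX = {
    ("MAIN", "MODAL"): count_modal,
    ("MAIN", "SUBCONJ"): open_subordinate,
    ("MAIN", "OTHER"): no_action,
    ("IN_SUB", "MODAL"): count_modal,
    ("IN_SUB", "SUBCONJ"): open_subordinate,
    ("IN_SUB", "OTHER"): no_action,
}


def analyze(tokens):
    """Walk one sentence, letting the matrix choose a routine for each token."""
    counts = Counter()
    state = "MAIN"
    for token in tokens:
        routine = MATRIX[(state, classify(token))]
        state = routine(counts, state)
    return counts
```

Keeping the decisions in a table rather than in nested conditionals means that a change in the analysis is a change to one cell, which is presumably part of the appeal of the matrix design.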



Insert Figure 2 and Figure 3 about here 



The indexed text is brought into core one sentence at a time. Each word
or syntactic punctuation mark is analyzed by the matrix. Punctuation, stop
lists, and their interrelationships are examined until a decision can be
made about what routine should be called. The routines process other program
algorithms and implement tabulation, flagging, or computation.

[1] This indexer is a modified version of the Index feature of John Smith's
RATSCAN (1972).

Figure 3

SYNTACTIC DENSITY ROUTINES SUMMARY
FROM TRANSITION MATRIX

 1. ENDT: End of a T-unit is encountered. Calculate main clause word count,
    initialize other variables, and read in the next sentence. Then go to GNW.
 2. GNW: Get next word in sentence to determine class it is in. Then determine
    state. Branch to that entry in matrix and perform that routine.
 3. SC: Check for subordinate conjunctions longer than one word (in order that,
    so that, provided that) and for those that cross-reference with prepositions.
 4. SERIES: Flag items in series. If a coordinating conjunction appears next,
    do not let it flag a compound sentence.
 5. CS: If items-in-a-series has not been flagged, a compound sentence has probably
    been encountered. Increment T-unit count.
 6. FOR: If the coordinating conjunction is for, since it did not follow a comma,
    for must be a preposition here. Increment preposition count.
 7. MARK-SUB: Since three words have followed the subordinate conjunction and
    no punctuation has been encountered, mark as a subordinate clause and increment
    subordinate clause word count by 3. Increment subordinate conjunction count
    by 1.
 8. A question mark is encountered. Check to see if the first word in the
    sentence is a relative pronoun. If so, cancel subordinate clause markers.
 9. VERBAL: Check for words ending in -ing, -ed, or -en that have more than 6
    characters and that do not have a form of have or be within the preceding
    3 words. If all 3 conditions prevail, increment verbal count.
10. POSSES: Check to see if the word is one of the possessive pronouns or ends
    in -'s or -s'. If one of these conditions is met, increment possessives count.
11. HAVE-BE: Check to see if the word is a form of have or be. If so, increment
    have-be count.
12. MODALS: Check the reference list to see if the word is a modal. If so,
    increment modal count.
13. TIME: Check to see if the word is on the list of adverbs of time. If so,
    increment adverbs of time count.
14. ERR 1: Print out the sentence for evaluation.
    A possible punctuation error has been encountered.
15. ERR 2: Print out the sentence for evaluation.
    A possible undefined transition matrix entry may be causing an error.
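
Two of the routines summarized in Figure 3, VERBAL and POSSES, are specific enough to sketch directly. The Python below is an illustration of those two stated rules only, under assumptions: the word lists are hypothetical stand-ins for the program's stop lists, the third verbal suffix is taken to be -en, and the function names are invented.

```python
# Hypothetical stand-ins for the program's stop lists.
HAVE_BE = {"have", "has", "had", "having",
           "be", "am", "is", "are", "was", "were", "been", "being"}
POSSESSIVE_PRONOUNS = {"my", "your", "his", "her", "its", "our", "their",
                       "mine", "yours", "hers", "ours", "theirs"}
VERBAL_SUFFIXES = ("ing", "ed", "en")  # "-en" is an assumed reading


def is_verbal(tokens, i):
    """VERBAL: long -ing/-ed/-en word with no have/be in the 3 words before it."""
    word = tokens[i].lower()
    if len(word) <= 6 or not word.endswith(VERBAL_SUFFIXES):
        return False
    preceding = tokens[max(0, i - 3):i]
    return not any(t.lower() in HAVE_BE for t in preceding)


def is_possessive(word):
    """POSSES: possessive pronoun, or a word ending in 's or s'."""
    w = word.lower()
    return w in POSSESSIVE_PRONOUNS or w.endswith(("'s", "s'"))


def count_features(tokens):
    """Count verbals and possessives in one blank-delimited sentence."""
    verbals = sum(is_verbal(tokens, i) for i in range(len(tokens)))
    possessives = sum(is_possessive(t) for t in tokens)
    return {"verbals": verbals, "possessives": possessives}
```

Rules of this kind are heuristic: a contraction such as "it's" would be counted as a possessive, and a long noun ending in -ing would be counted as a verbal.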

For most variables and for all computations, the machine scoring has been
found to be more accurate than hand scoring. Many of the decisions to be
made by the machine are quite deterministic. Counts of possessive pronouns,
modals, and words, for instance, can be definitely and easily decided. For
more complicated decisions, program algorithms check series of conditions to
be met before a decision is made. A few of the decisions are probabilistic.
Occasionally one of the probabilistic decisions might be discovered to be
erroneous, but repeated analyses reveal that the program is consistent and
predominantly accurate. The printout shows the text being analyzed and a
tabulation sheet for each sample that lists the frequencies and subscores on
each of the linguistic variables and gives the computed Syntactic Density
Score. A Grade Level Conversion Table is also displayed on each Tabulation
Sheet.
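
A lookup against the Grade Level Conversion Table might be sketched as below, in Python. Two assumptions are made: that the fourteen tabled scores correspond to grades 1 through 14 in order, and that a score between two tabled values is assigned the lower of the two grades. The tabulation sheet itself does not state how intermediate scores are to be read, and the function name is hypothetical.

```python
from bisect import bisect_right

# Tabled Syntactic Density Scores, assumed to map to grades 1 through 14 in order.
SDS_TABLE = [0.5, 1.3, 2.1, 2.9, 3.7, 4.5, 5.3,
             6.1, 6.9, 7.7, 8.5, 9.3, 10.1, 10.9]


def grade_level(sds):
    """Return the highest grade whose tabled score does not exceed sds.

    Returns 0 when the score falls below the first table entry.
    """
    return bisect_right(SDS_TABLE, sds)
```

Since the tabled scores rise in equal steps of .8 per grade, an interpolating reading would be equally easy to write; the step-wise version is shown only because it is the simplest.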

To compare the machine scoring with hand scoring, twelve 200-word samples
of graded reading material were scored by a trained rater and carefully
checked by a second rater. The same samples were then scored by the computer.
The Pearson Product-Moment Correlation of the "hand" and "machine" analyses
was .96, with the machine scores running consistently slightly higher than
the hand scores. Golub's original formula calls for a count of forms of
have or be used in the auxiliary position. The computer counts all occurrences
of forms of have or be. This difference between hand tabulation and computer
tabulation accounts for part of the slightly higher machine scores.
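
The comparison statistic can be computed as in the Python sketch below. The scores used in the usage lines are invented placeholders, not the twelve samples of the study; they are chosen only to show that a consistent upward offset in the machine scores leaves the correlation unaffected.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / sqrt(sxx * syy)


# Invented placeholder scores: the machine runs a constant 0.2 above the hand.
hand = [1.0, 2.0, 3.0, 4.0]
machine = [1.2, 2.2, 3.2, 4.2]
r = pearson_r(hand, machine)   # very close to 1.0 despite the offset
```

This is why a high correlation and a consistently higher machine score are compatible: the correlation measures agreement in ordering and spacing, not agreement in level.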

Sixty 200-word samples of graded reading material (ten samples at each
grade level from second to seventh grade) were analyzed by the program to
determine whether there is a significant difference in syntactic density in
materials designed for different grade levels. The Syntactic Density increases
at each grade level. The differences between grade levels were statistically
significant at yearly intervals in two of the five intervals (p < .05).
When differences were examined by two-year intervals, statistical significance
was found at every interval (p < .05).

An analysis of written language of children in first through sixth grade
is now in progress to test the discriminatory power and range of the instrument.
The computer program is also being used in a study of the effects of alternate
types of learning experiences on children's written compositions, in a project
to coordinate reading levels of selected and directed reading materials for a
Pennsylvania school district, and in preparing some materials for testing and
research in content areas.

The Syntactic Density computer program should be a useful instrument for
further experimental investigations, for assessing readability levels, for
development of materials, for diagnostic purposes, and for stylistic analysis.